Cache Awareness in Blocking Techniques
Authors
Abstract
To date, data locality optimizing algorithms mostly aim at providing efficient strategies for blocking and reordering loops. But little research has been devoted to the final step, i.e., computing the optimal block size. Optimal block sizes are currently computed as if a cache behaves as a local memory, i.e., cache interference phenomena are ignored. Case studies have already shown that cache interferences can greatly affect the optimal block size. The purpose of this paper is to propose a methodology for estimating interference misses in a regular do-loop nest, and to use that knowledge to derive the optimal block size. First, the different types of interference phenomena are identified, and a method for predicting their occurrence and evaluating their impact is proposed. Second, current techniques for computing the optimal block size are analytically and experimentally shown to yield far below optimal performance. Third, cache interference phenomena and even TLB behavior are taken into account in the computation of the optimal block size, which proves to yield near-optimal performance and consequently makes blocking techniques safe. Conversely, it is also shown that even when no capacity miss occurs, blocking techniques can be used to significantly reduce the number of cache interferences.
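As a rough illustration of why treating a cache as a local memory can mislead block-size selection, the sketch below counts how many memory lines of a single tile collide with one another in a direct-mapped cache. It is not the paper's estimation method; the function name `tile_self_interference`, the row-major layout, and the cache parameters (64 sets, 4 elements per line) are all assumptions chosen for the example.

```python
def tile_self_interference(leading_dim, block, num_sets, line_elems):
    """Count lines of a block x block tile that collide in a direct-mapped cache.

    Hypothetical model: row-major array, addresses measured in elements,
    one cache line per set (direct-mapped). Not taken from the paper.
    """
    lines = set()
    sets_used = set()
    for i in range(block):
        for j in range(block):
            addr = i * leading_dim + j      # element offset within the array
            line = addr // line_elems       # memory line holding the element
            lines.add(line)
            sets_used.add(line % num_sets)  # cache set that line maps to
    # Lines beyond the number of distinct sets must evict another tile line.
    return len(lines) - len(sets_used)


if __name__ == "__main__":
    # A 16 x 16 tile spans 64 lines, exactly the capacity of the modeled cache.
    for leading_dim in (256, 272):
        print(leading_dim, tile_self_interference(leading_dim, 16, 64, 4))
```

Under these assumptions the tile fits the cache by capacity in both cases, yet with a leading dimension of 256 its rows map onto the same four sets (60 colliding lines), while a leading dimension of 272 spreads the tile over all 64 sets with no collisions. A capacity-only block-size computation would treat the two cases identically.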
Similar resources
Cache Awareness in Blocking Techniques
To date, data locality optimizing algorithms mostly aim at providing strategies for blocking and reordering loops. But little research has been devoted to the final step: finding the optimal block size, i.e., a block size that provides the best possible performance. Optimal block sizes are currently computed as if a cache is a local memory, i.e., cache interferences are ignored. Case-studies have a...
Improve Replica Placement in Content Distribution Networks with Hybrid Technique
The increasing use of the Internet and its accelerated growth reduce the available network bandwidth and server capacity; as a result, the quality of Internet services becomes unacceptable for users, even though efficient and effective delivery of web content plays an important role in improving performance. Content distribution networks were introduced to address this issue. Replicatin...
In-Core Optimization of High-Order Stencil Computations
In this paper, we apply in-core optimization techniques to high-order stencil computations, including: (1) cache blocking for efficient L2 cache use; (2) register blocking and data-level parallelism via single-instruction multiple-data (SIMD) techniques to increase L1 cache efficiency; and (3) software prefetching techniques. Our generic approach is tested with a kernel extracted from a 6th-or...
On Effective Data Supply For Multi-Issue Processors
Emerging multi-issue microprocessors require effective data supply to sustain multiple instruction processing. The data cache structure, the backbone of data supply, has been organized and managed as one large homogenous resource, offering little flexibility for selective caching. While memory latency hiding techniques and multi-ported caches are critical to effective data supply, we show in th...
DRAFT: Polynomial Multiplication: Blocking to Improve Cache Performance
We search for techniques to decrease the multiplication time for large sparse polynomials in Lisp by speeding up the sequential accesses of large vectors. We do this by utilizing blocking to improve cache performance, which we show to be effective for sufficiently large problems.
Journal title:
Volume, issue:
Pages: -
Publication date: 2007